IN THE UNITED STATES
PATENT AND TRADEMARK OFFICE 

Declaration and Power of Attorney 

As a below named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below next to my name. 

I believe I am an original, first and joint inventor of the subject matter which is claimed 
and for which a patent is sought on the invention entitled MULTIPROCESSOR SYSTEM 
WITH CACHE-BASED SOFTWARE BREAKPOINTS the specification of which is attached 
hereto. 

I hereby state that I have reviewed and understand the contents of the above identified 
specification, including the claims, as amended by an amendment, if any, specifically referred to 
in this oath or declaration. 

I acknowledge the duty to disclose all information known to me which is material to 
patentability as defined in Title 37, Code of Federal Regulations, 1.56. 

I hereby claim foreign priority benefits under Title 35, United States Code, 119 of any
foreign application(s) for patent or inventor's certificate listed below and have also identified 
below any foreign application for patent or inventor's certificate having a filing date before that 
of the application on which priority is claimed: 

None 

I hereby claim the benefit under Title 35, United States Code, 120 of any United States 
application(s) listed below and, insofar as the subject matter of each of the claims of this 
application is not disclosed in the prior United States application in the manner provided by the 
first paragraph of Title 35, United States Code, 112, I acknowledge the duty to disclose all 
information known to me to be material to patentability as defined in Title 37, Code of Federal
Regulations, 1.56 which became available between the filing date of the prior application and the
national or PCT international filing date of this application: 

None 

I hereby declare that all statements made herein of my own knowledge are true and that 
all statements made on information and belief are believed to be true; and further that these
statements were made with the knowledge that willful false statements and the like so made are
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States
Code and that such willful false statements may jeopardize the validity of the application or any
patent issued thereon. 

I hereby appoint the following attorney(s) with full power of substitution and revocation,
to prosecute said application, to make alterations and amendments therein, to receive the patent,
and to transact all business in the Patent and Trademark Office connected therewith:



Lester H. Birnbaum (Reg. No. 25830)
Richard J. Botos (Reg. No. 32016)
Gerard A. deBlasi (Reg. No. 34149)
Anthony Grillo (Reg. No. 36535)
Mark A. Kurisko (Reg. No. 38944)
Robert P. Marley (Reg. No. 32914)
Scott W. McLellan (Reg. No. 30776)
Geraldine Monteleone (Reg. No. 40097)
Scott J. Rittman (Reg. No. 39010)
Ferdinand M. Romano (Reg. No. 32752)
David L. Smith (Reg. No. 30592)
John P. Veschi (Reg. No. 39058)

I hereby appoint the attorney(s) on ATTACHMENT A as associate attorney(s) in the
aforementioned application, with full power solely to prosecute said application, to make
alterations and amendments therein, to receive the patent, and to transact all business in the Patent
and Trademark Office connected with the prosecution of said application. No other powers are
granted to such associate attorney(s) and such associate attorney(s) are specifically denied any
power of substitution or revocation.



Full name of 1st joint inventor: Michael Richard Betker



Inventor's signature: [signature]    Date: [illegible]

Residence: Allentown, Lehigh County, Pennsylvania 
Citizenship: United States of America 

Post Office Address: 666 Cedar Hills Drive

Allentown, Pennsylvania 18103

Full name of 2nd joint inventor: Han Q. Nguyen 



Inventor's signature: [signature]    Date: [illegible]

Residence: Marlboro, Monmouth County, New Jersey

Citizenship: United States of America 

Post Office Address: 2 ^arp Lan e 

Marlboro, New Jersey 07746

Full name of 3rd joint inventor: Bryan Schlieder

Inventor's signature: [signature]    Date: [illegible]

Residence: Bethlehem, Northampton County, Pennsylvania

Citizenship: United States of America 

Post Office Address: 2935 Hecktown Road 

Bethlehem, Pennsylvania 18020 

Full name of 4th joint inventor: Shaun Patrick Whalen 

Inventor's signature: [signature]    Date: [illegible]

Residence: Wescosville, Lehigh County, Pennsylvania 

Citizenship: United States of America 

Post Office Address: 5301 Hanover Drive 

Wescosville, Pennsylvania 18106

Full name of 5th joint inventor: Jay Patrick Wilshire 

Inventor's signature: [signature]    Date: [illegible]

Residence: Pennsburg, Bucks County, Pennsylvania 

Citizenship: United States of America 

Post Office Address: 2030 Miller Road 

Pennsburg, Pennsylvania 18073 

ATTACHMENT A



Attorney Name(s): Joseph B. Ryan Reg. No. 37922

Kevin M. Mason Reg. No. 36597 

William E. Lewis Reg. No. 39274 

Robert J. Mauri Reg. No. 41180

Wayne L. Ellenbogen Reg. No. 43602 

James M. Loeffler Reg. No. 37873 

James F. Harrington Reg. No. 44741 



Telephone calls should be made to Joseph B. Ryan of Ryan, Mason & Lewis, LLP at: 

Phone No.: (516)759-7517 
Fax No.: (516)759-9512 

All written communications are to be addressed to: 

Ryan, Mason & Lewis, LLP 

90 Forest Avenue 

Locust Valley, New York 11560

MULTIPROCESSOR SYSTEM
WITH CACHE-BASED SOFTWARE BREAKPOINTS 

Field of the Invention 

The present invention relates generally to multiprocessor integrated circuits and other types 
of systems which include multiple processors, and more particularly to techniques for implementing
software breakpoints in such systems.

Background of the Invention 

Software breakpoints are typically implemented as assembly instructions embedded in a 
program. A given one of these assembly instructions when encountered in the program causes 
execution of the program to transfer to a debugger. As is well known, the use of designated 
breakpoints in conjunction with the debugger allows a programmer to inspect the state of the 
program in order to fix a problem or better understand program behavior.

In the case of a single conventional processor, a software breakpoint may be embedded in 
a program by overwriting an existing instruction at the desired breakpoint location with an assembly 
instruction in the form of a specified debug operation code, also referred to as an opcode. The 
overwritten instruction is stored by the debugger so that it may subsequently replace the inserted 
opcode upon resumption of normal operation, as will be described below. Examples of conventional 
debug opcodes include opcodes referred to as DEBUG, TRAP, etc. These debug opcodes generally 
vary depending on the processor. 

After a given breakpoint is taken and the programmer is ready to continue with normal 
execution of the program, a two step process typically takes place. First, the debugger replaces the 
debug opcode with the previously-overwritten existing instruction, so that this instruction alone may 
be executed (or "stepped"). Then the debug opcode is restored at the desired breakpoint location so 
that the next time program execution reaches this point, another breakpoint will occur. 
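
The conventional single-processor sequence described above can be sketched as follows. This is only an illustrative model, not code from any actual debugger: the class name `SingleProcessorDebugger`, the dictionary used as program memory, the string `"DEBUG"` standing in for a processor-specific debug opcode, and the example addresses are all assumptions made for the sketch.

```python
DEBUG = "DEBUG"  # assumed placeholder; real debug opcodes vary by processor


class SingleProcessorDebugger:
    """Hypothetical model of conventional breakpoint handling."""

    def __init__(self, memory):
        self.memory = memory  # program memory modeled as address -> instruction
        self.saved = {}       # address -> instruction overwritten by DEBUG

    def set_breakpoint(self, addr):
        # Save the existing instruction, then overwrite it with the debug opcode.
        self.saved[addr] = self.memory[addr]
        self.memory[addr] = DEBUG

    def resume(self, addr, execute):
        # Step 1: restore the original instruction and execute it alone.
        self.memory[addr] = self.saved[addr]
        execute(self.memory[addr])
        # Step 2: put the debug opcode back so the breakpoint fires next time.
        self.memory[addr] = DEBUG


memory = {0x100: "ADD", 0x104: "SUB"}
debugger = SingleProcessorDebugger(memory)
debugger.set_breakpoint(0x104)

executed = []
debugger.resume(0x104, executed.append)
```

After `resume` returns, the original instruction has been executed exactly once and the debug opcode is back in memory at the breakpoint address.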

Many high performance processors include an instruction cache. An instruction cache 
enables the processor to store frequently accessed instructions in a fast local memory that adapts its 
contents to the executing program. 



Betker 7-1-3-12-5 

A multiprocessor system generally includes multiple processors each connected to a common
bus. Also connected to the common bus is a shared main memory that stores instructions and data 
for use by the processors. In such a system, each of the processors will typically have its own 
instruction cache. These instruction caches are particularly important in multiprocessor systems in 
that the caches serve to significantly reduce the bus traffic associated with the instruction and data 
fetches of the multiple processors. 

The conventional single-processor software breakpoints described above generally do not 
work in a multiprocessor system. More particularly, in a multiprocessor system, once a particular 
one of the processors has taken a software breakpoint, the corresponding actual instruction must be 
written to the shared memory so the stopped processor can subsequently fetch and execute it.
However, while the actual instruction is in shared memory, the other processors may fetch and 
execute it, thus missing the breakpoint. Thus, in a shared-memory multiprocessor system, the 
programmer utilizing conventional single-processor software breakpoint techniques would be likely 
to observe erratic behavior, e.g., sometimes a given processor would stop at a software breakpoint 
and other times it would not. 
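
The missed-breakpoint window described above can be illustrated with a small sequential model. It is a sketch under stated assumptions: shared memory is a dictionary, the address `0x200` and the instruction name `"MUL"` are hypothetical, and the interleaving of the two processors is written out by hand rather than produced by real concurrency.

```python
DEBUG = "DEBUG"                  # assumed placeholder debug opcode
shared_memory = {0x200: DEBUG}   # breakpoint already set at a hypothetical address
saved_instruction = "MUL"        # hypothetical instruction the debugger saved


def fetch(addr):
    """Every processor fetches from the same shared memory."""
    return shared_memory[addr]


# One processor has taken the breakpoint; the debugger restores the actual
# instruction in shared memory so that processor can step over it.
shared_memory[0x200] = saved_instruction

# While the actual instruction is visible, a second processor fetches the
# same address and sees the real instruction instead of the debug opcode.
other_processor_fetch = fetch(0x200)
stopped_processor_fetch = fetch(0x200)

# The debugger then puts the debug opcode back.
shared_memory[0x200] = DEBUG

other_processor_missed_breakpoint = other_processor_fetch != DEBUG
```

Whether the second processor misses the breakpoint depends entirely on whether its fetch falls inside the window, which is why the behavior appears erratic to the programmer.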

Examples of known multiprocessor systems include the DSP16410 and DSP16270 shared-memory
multiprocessor digital signal processors (DSPs) available from Agere Systems Inc. of
Allentown, Pennsylvania, USA. Each of these systems includes two processors, denoted DSP0 and
DSP1. These processors do not have instruction caches, but can execute from a common shared
memory. Consider a case where a software breakpoint has been set in the shared memory and
executed by DSP0. In order to resume normal execution after the breakpoint is taken, the actual
instruction code at the breakpoint location should be placed in local memory that is private to DSP0.
This enables DSP0 to execute the actual instruction without preventing DSP1 from hitting the
desired breakpoint. Implementing multiprocessor software breakpoints on the DSP16410 or the
DSP16270 therefore may require local instruction memory, which adds to the cost of the device.
In addition, rebuilding the executable code so that the breakpoint debug opcode is moved to the
address space corresponding to the local memory can be inconvenient and time consuming for the
programmer. These and other similar problems arise in a wide variety of other conventional 
multiprocessor systems. 

A need therefore exists for improved techniques for implementing software breakpoints in
shared-memory multiprocessor systems, so as to avoid one or more of the above-described problems.

Summary of the Invention

The invention provides improved techniques for implementing software breakpoints in a 
shared-memory multiprocessor system. 

In accordance with one aspect of the invention, a software breakpoint is implemented in a
multiprocessor system having a number of processors each coupled to a main memory. Each of the
processors preferably has an instruction cache associated therewith. An instruction for which a
breakpoint is to be inserted is retrieved from a corresponding instruction address in the main 
memory, and a breakpoint code, e.g., a debug opcode, is inserted at the instruction address in main 
memory. 

The invention may be implemented using a debugger which communicates with the
multiprocessor system via an interface. The debugger saves the retrieved instruction that has been
replaced with the breakpoint code, and when the breakpoint code is executed by a given one of the
processors, a breakpoint is taken and the debugger takes control of the given processor. Before the 
debugger resumes execution, it stores the retrieved instruction in the instruction cache for the given 
processor, and sets a use-once indicator associated with the instruction as stored in the corresponding 
instruction cache for that processor. 

The use-once indicator, when set for the instruction as stored in the instruction cache, is 
operative via cache control logic to clear a validity indicator associated with the instruction after a 
single fetch of the instruction from the instruction cache. As a result, subsequent attempts by the 
given processor to access the instruction as stored in the instruction cache will cause the processor 
to retrieve the breakpoint code at the instruction address in main memory. 

Advantageously, the software breakpoint techniques of the present invention ensure that a 
specified debug opcode or other breakpoint code can remain present in the main memory of the 
multiprocessor system at all times, such that all of the processors of the system will reliably fetch 
and execute that breakpoint code even if one or more of these processors are resuming execution 
from the breakpoint.

Brief Description of the Drawings

FIG. 1 is a block diagram of an example multiprocessor system in which the present 
invention is implemented. 

FIG. 2 illustrates the operation of an instruction cache of the FIG. 1 multiprocessor system 
in accordance with the invention. 

FIG. 3 illustrates the interaction of the FIG. 1 multiprocessor system with a debugger. 

FIG. 4 shows a state and flow diagram illustrating the operation of a multiprocessor system 
in accordance with the invention. 

FIG. 5 is an example illustrating the operation of a three-processor implementation of a 
multiprocessor system in accordance with the invention. 

Detailed Description of the Invention 

The invention will be illustrated herein in conjunction with an exemplary shared-memory 
multiprocessor system. It should be understood, however, that the invention is more generally 
applicable to any shared-memory multiprocessor system in which it is desirable to provide improved 
performance through the use of software breakpoints. The term "multiprocessor system" as used
herein is intended to include any device in which retrieved instructions are executed using one or
more processors. The term "processor" is intended to include, by way of example and without
limitation, microprocessors, central processing units (CPUs), very long instruction word (VLIW) 
processors, single-issue processors, multi-issue processors, digital signal processors (DSPs), 
application-specific integrated circuits (ASICs), and other types of data processing devices, as well 
as portions and combinations of these and other devices. 

FIG. 1 shows a shared-memory multiprocessor system 100 in which the present invention
is implemented. The system 100 includes a number of processors 102-1, 102-2, . . . 102-N, each
coupled to a main memory 104 via a common bus 106. Also associated with each of the processors
102-i, i = 1, 2, . . . N, is a corresponding multiplexer 108-i and instruction cache (Icache) 110-i. In
operation, the processors 102 each retrieve instructions for execution, as well as associated data,
from the main memory 104. The instructions may be retrieved directly from the main memory 104
via a first path through the multiplexer 108, or from the instruction cache 110 via a second path
through the multiplexer 108. The processors 102 also return instruction execution results to the main
memory 104 via bus 106, although separate return paths from the processors to the bus 106 are not
explicitly shown in the figure. These return paths may be combined in whole or in part with the
illustrative signal lines used to retrieve the instructions and data, i.e., bidirectional signal lines may
be used.

Also included in the multiprocessor system 100 is a set of cache control logic 112. Although
not explicitly shown as such in the figure, the cache control logic 112 may be coupled to each of the
instruction caches 110. Alternatively, the cache control logic 112 may be distributed across the
instruction caches 110 such that a portion of the cache control logic 112 forms an internal portion
of each of the instruction caches 110. Other arrangements may also be used, e.g., various 
combinations of centralized and distributed logic. The operation of the cache control logic 112 will 
be described in greater detail below in conjunction with the description of FIGS. 2 through 5. 
Suitable hardware or software elements, or various combinations thereof, may be configured in a 
straightforward manner to provide the cache control logic functionality described herein, as will be 
appreciated by those skilled in the art. The cache control logic 112 may therefore be implemented
in hardware, in software, or as a combination of hardware and software. 

In accordance with the present invention, the instruction caches 110 of the FIG. 1 system are
configured so as to permit a debugger to place and mark an existing instruction, for which a 
breakpoint is to be established, in a given one of the instruction caches of a particular processor. 
Advantageously, the placing and marking of the instruction ensure that the other processors of the 
system will not miss the breakpoint when the particular processor is resumed. The illustrative 
embodiment of the invention implements this capability in part through the addition of a control bit, 
referred to herein as a use-once (U) bit, to each set of instruction information stored in the instruction
cache, and by permitting the debugger to write this information directly to the instruction cache.

FIG. 2 illustrates the instruction processing in a given one of the instruction caches 110 of
the FIG. 1 system, i.e., a particular instruction cache 110-i, although it should be understood that
substantially the same instruction processing is utilized in each of the instruction caches 110. In the
illustrative embodiment, when the corresponding processor 102-i fetches an instruction from the
instruction cache 110-i, it presents an address 200 to the cache. The address 200 as shown in FIG.
2 includes a tag field 202, an index field 204, and a block offset field 206. The address 200 in this
example is a 32-bit address, with bits 0-3 corresponding to the block offset, bits 4-11 corresponding
to the index, and bits 12-31 corresponding to the tag.

It should be emphasized that this particular address arrangement is by way of example only, 
and other address lengths and field configurations may be used. In other words, the invention does 
not require that the presented address have a particular format. 
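
For the example 32-bit layout given above (bits 0-3 block offset, bits 4-11 index, bits 12-31 tag), the field extraction can be sketched as follows. The function name `split_address` and the sample address are assumptions introduced for illustration; other field widths would only change the three constants.

```python
OFFSET_BITS = 4   # bits 0-3: block offset field 206
INDEX_BITS = 8    # bits 4-11: index field 204
TAG_BITS = 20     # bits 12-31: tag field 202


def split_address(addr):
    """Split a 32-bit address into (tag, index, block offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset


# Hypothetical sample address, chosen so each field is easy to read in hex.
tag, index, offset = split_address(0x12345678)
```

With this layout, the sample address 0x12345678 yields tag 0x12345, index 0x67 and block offset 0x8.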

The index field 204 of the address 200 is used to select one of several sets in the instruction 
cache 110-i. More particularly, the particular value of the index field 204 specifies a particular set
210 within a group of such sets, where each of the sets corresponds generally to a set of instructions 
stored in the cache. Each instruction in the set 210 includes a valid (V) bit, the above-mentioned 
use-once (U) bit, a tag field, and the instruction data. 

The instruction cache 110-i in accordance with the invention includes logic elements 212 and
214, which may be implemented in a straightforward manner well-known to those skilled in the art.
As shown in the figure, logic element 212 compares the tag field 202 of the presented address 200
to the tag fields of the set 210 as selected using the index field 204, and logic element 214 performs
an AND operation on the output of the logic element 212 and the valid bit. The output of logic
element 214 is at a logic high level if for a given instruction in the set 210 the tag field 202 matches
its tag field and its valid bit is set, i.e., at a logic high level. In this situation, the cache is said to
"hit" at the instruction address 200, and otherwise to "miss" at the instruction address 200.

In normal operation absent any inserted software breakpoints, if there is a match between the 
tag field 202 of the address and the tag field of a given one of the instructions, and the valid bit is 
set for the given instruction, then the desired instruction data is in the cache 110-i and is returned to
the corresponding processor 102-i. If there is no match or the valid bit is clear, the desired
instruction data is not in the cache 110-i, and therefore must be fetched from the main memory 104.
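
The hit determination performed by logic elements 212 and 214 can be modeled in software as a tag comparison followed by an AND with the valid bit. The sketch below is an assumption-laden illustration rather than a description of the actual hardware: the `CacheEntry` class, the `lookup` function and the sample tags are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """One element of a cache set: V bit, U bit, tag and instruction data."""
    valid: bool = False
    use_once: bool = False
    tag: int = 0
    data: str = ""


def lookup(cache_set, tag):
    """Return the matching entry on a hit, or None on a miss."""
    for entry in cache_set:
        tag_match = entry.tag == tag   # comparison done by logic element 212
        if tag_match and entry.valid:  # AND done by logic element 214
            return entry
    return None


cache_set = [
    CacheEntry(valid=True, tag=0x12345, data="ADD"),
    CacheEntry(valid=False, tag=0x54321, data="SUB"),
]

hit = lookup(cache_set, 0x12345)           # tag matches and valid bit is set
miss_invalid = lookup(cache_set, 0x54321)  # tag matches but valid bit is clear
miss_no_tag = lookup(cache_set, 0x99999)   # no entry carries this tag
```

On a miss, the instruction would be fetched from main memory, as the preceding paragraph describes.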

As indicated above, the use-once bit is utilized to implement software breakpoints in an 
advantageous manner in accordance with the techniques of the invention. Such breakpoints may be
inserted using an arrangement such as that shown in FIG. 3. In this particular software breakpoint
processing configuration 300, a debugger 302 interacts via an interface 304 with the multiprocessor
system 100 of FIG. 1. The debugger 302, which may also be referred to as a debugger tool, may be
implemented using appropriately-configured hardware and software elements, as is well known. The
interface 304 may also be implemented in a manner well understood by those skilled in the art, and
is therefore not described in detail herein. It is to be appreciated that the invention does not require 
the use of any particular type of debugger or multiprocessor system interface. The debugger will 
typically run on a processor that is not an element of the system 100, but this is by way of example 
and not limitation. 

In order to program a software breakpoint for a given processor of the multiprocessor system 
100, a programmer utilizes the debugger 302 of FIG. 3 to overwrite an existing instruction at the 
desired breakpoint location in main memory 104 with a specified debug opcode. However, the 
existing instruction to be overwritten is first fetched from main memory 104 and saved by the
debugger for subsequent storage in one or more of the instruction caches 110 upon resumption of
execution from a taken breakpoint, as will be described in greater detail below. 

When a given processor 102-i executes the inserted debug opcode from the main memory
104, the breakpoint is said to be taken by that processor, and execution of the program is transferred
to the debugger 302, e.g., via a designated exception, transition to a debug operation mode, etc.
After the breakpoint is taken by processor 102-i and the programmer is ready to continue with
normal execution of the program, the debugger 302 writes the instruction data portion of the
appropriate element in the cache set 210 with the corresponding information from the actual
instruction for which the breakpoint was taken, and sets the use-once bit for that element in the cache
set 210.

The processor 102-i resumes execution of the program after the breakpoint is taken by
executing a fetch request using the address of the actual instruction for which the breakpoint was
taken. This fetch request is serviced from the instruction cache 110-i. However, immediately after
servicing this request, and in accordance with the techniques of the invention, the set use-once bit
causes the cache control logic to clear the valid bit for that element of the set 210. Thus, the next
time the processor 102-i issues a fetch request for this instruction address, the cache will "miss" and
the debug opcode will automatically be fetched again from main memory 104.
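
The resume sequence described above can be sketched as a small software model of one instruction cache. This is a simplified illustration, not the actual cache design: the class `InstructionCache`, its method names, the use of a dictionary keyed by full address in place of index and tag fields, and the instruction name `"LOAD"` are all assumptions.

```python
DEBUG = "DEBUG"  # assumed placeholder debug opcode


class InstructionCache:
    """Hypothetical per-processor cache with valid (V) and use-once (U) bits."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.lines = {}  # address -> {"valid", "use_once", "data"}

    def debugger_write(self, addr, instruction):
        # The debugger places the actual instruction and sets the U bit.
        self.lines[addr] = {"valid": True, "use_once": True, "data": instruction}

    def fetch(self, addr):
        line = self.lines.get(addr)
        if line is not None and line["valid"]:
            if line["use_once"]:
                # Serve this one fetch, then invalidate the entry.
                line["valid"] = False
                line["use_once"] = False
            return line["data"]
        # Miss: fetch from main memory and fill the cache.
        data = self.main_memory[addr]
        self.lines[addr] = {"valid": True, "use_once": False, "data": data}
        return data


main_memory = {0x300: DEBUG}            # debug opcode stays in main memory
icache = InstructionCache(main_memory)

first = icache.fetch(0x300)             # breakpoint is taken
icache.debugger_write(0x300, "LOAD")    # debugger prepares the resume
second = icache.fetch(0x300)            # actual instruction, served once
third = icache.fetch(0x300)             # miss, so DEBUG is fetched again
```

Main memory is never modified after the breakpoint is set, so any other processor fetching the same address continues to see the debug opcode.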

Advantageously, the above-described mechanism ensures that the desired debug opcode is
always present in main memory 104, such that all of the processors 102 will reliably fetch and 
execute it even if one or more of these processors are resuming execution from the breakpoint. 

It should be noted that it is common to associate attributes with instruction addresses to
control the handling of the instruction fetch. For example, one known attribute is the so-called 
"noncacheable" attribute. This attribute indicates whether a particular instruction address should be 
allowed to be stored in the instruction cache. Each of the processors 102 typically includes hardware
for interpreting such attributes. For an instruction address with the noncacheable attribute, the cache 
control logic always treats a fetch to this address as a miss. 

The above-described use-once bit, however, changes the handling of the noncacheable
attribute so that a software breakpoint can be set on a noncacheable instruction. For a software
breakpoint on a noncacheable instruction, a given processor 102-i fetches the breakpointed
instruction, but does not put the instruction in its instruction cache. After the breakpoint is taken by
processor 102-i and the programmer is ready to continue with normal execution of the program, the
debugger 302 writes the instruction data portion of the appropriate element in the cache set 210 with
the corresponding information from the actual noncacheable instruction for which the breakpoint
was taken, and sets the use-once bit for that element in the cache set 210. The set use-once bit causes
the cache control logic to fetch this instruction address from the cache even though its attribute says 
that it is noncacheable. Once the fetch is serviced from the cache, the valid bit of the instruction is 
cleared as previously described. Thus, the use-once bit mechanism of the present invention allows 
software breakpoints to be set on both cacheable and noncacheable instructions, and to be taken by 
a given processor without impacting other processors in the multiprocessor system 100. 
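
The interaction between the use-once bit and the noncacheable attribute can be sketched as an ordering of checks in the fetch path. The model below is an assumption made for illustration: the function `fetch`, the set `noncacheable`, the dictionary-based cache and the instruction name `"STORE"` are hypothetical, and real attribute handling is performed by processor hardware.

```python
def fetch(addr, cache, main_memory, noncacheable):
    """Fetch one instruction, honoring the U bit before the noncacheable attribute."""
    line = cache.get(addr)
    if line is not None and line["valid"] and line["use_once"]:
        # A set U bit overrides the noncacheable attribute for one fetch.
        line["valid"] = False
        line["use_once"] = False
        return line["data"]
    if addr in noncacheable:
        # Noncacheable addresses are otherwise always treated as a miss
        # and are never placed in the cache.
        return main_memory[addr]
    if line is not None and line["valid"]:
        return line["data"]
    data = main_memory[addr]
    cache[addr] = {"valid": True, "use_once": False, "data": data}
    return data


main_memory = {0x500: "DEBUG"}   # breakpoint set on a noncacheable instruction
noncacheable = {0x500}
cache = {}

first = fetch(0x500, cache, main_memory, noncacheable)
cache_empty_after_first = cache == {}

# Debugger writes the actual instruction with the U bit set before resuming.
cache[0x500] = {"valid": True, "use_once": True, "data": "STORE"}
second = fetch(0x500, cache, main_memory, noncacheable)
third = fetch(0x500, cache, main_memory, noncacheable)
```

Checking the use-once bit first is the design choice that lets the same resume mechanism serve both cacheable and noncacheable instructions.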

FIG. 4 shows a state and flow diagram 400 which illustrates in greater detail the above- 
described cache-based software breakpoint process as implemented in multiprocessor system 100. 
The diagram 400 includes an IDLE state 402, a USE ONCE state 404, and a MISS state 406. 

The system is initially in the IDLE state, e.g., upon execution of a reset command as shown. 
From the IDLE state, a determination is made in step 410 as to whether an instruction fetch has been 
requested by a given processor 102 of the system. If not, the process remains in the IDLE state. 
Otherwise, step 412 determines if there has been an instruction cache hit for the given processor. 
If there is no instruction cache hit in step 412, the given processor is stalled and the instruction
request is directed to the main memory 104 of the system, as indicated in step 414, and the process
then enters the MISS state. If there is an instruction cache hit in step 412, step 416 determines if the
USE ONCE mode is set, i.e., determines if the use-once bit for the instruction cache is set. If the
use-once bit is set, the given processor is stalled and the valid and use-once bits of the instruction
cache are cleared, as indicated in step 418, and the process then enters the USE ONCE state. If the
use-once bit is not set in step 416, the instruction fetch data is returned from the instruction cache
as indicated in step 420. 

From the USE ONCE state, the process in step 422 removes the processor stall that was 
implemented in step 418, and proceeds to step 420 to return the instruction fetch data from the 
instruction cache. 

From the MISS state, the process in step 424 determines if the instruction fetch data has been 
returned from the main memory. If the instruction fetch data has not been returned, the process
remains in the MISS state. If the instruction fetch data has been returned, it is returned to the
requesting processor in step 426, and the process returns to the IDLE state. 

At least a portion of the operations described in conjunction with FIG. 4 can be implemented 
using the cache control logic 112 of FIG. 1. For example, the cache control logic 112 may be 
configured to clear the valid bit in step 418 based on the determination of the state of the use-once 
bit in step 416. As indicated previously, suitable arrangements of hardware, software or both for 
accomplishing these and other operations in the cache control logic 1 12 will be readily apparent to 
those skilled in the art.

It is also to be appreciated that the particular processing operations shown in FIG. 4 are by 
way of example only, and should not be construed as limitations of the invention. Those skilled in
the art will recognize that the techniques of the invention can be implemented using other 
arrangements of processing operations. 
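
One possible software rendering of the FIG. 4 flow for a single fetch request is sketched below. It rests on several assumptions that are not in the description: the function `service_fetch`, the dictionary describing the addressed cache element, the `memory_latency` parameter that models how long the process waits in the MISS state, and the choice to return to IDLE after step 420.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    USE_ONCE = "USE ONCE"
    MISS = "MISS"


def service_fetch(line, memory_latency=2):
    """Return (data source, list of states visited) for one fetch request.

    `line` describes the addressed cache element with the keys
    "tag_match", "valid" and "use_once".
    """
    trace = [State.IDLE]
    hit = line["tag_match"] and line["valid"]        # step 412
    if not hit:
        # Step 414: stall and direct the request to main memory.
        trace.extend([State.MISS] * memory_latency)  # step 424 repeats
        source = "main memory"                       # step 426
    elif line["use_once"]:                           # step 416
        # Step 418: stall, clear the valid and use-once bits.
        line["valid"] = False
        line["use_once"] = False
        trace.append(State.USE_ONCE)
        source = "cache"                             # steps 422 and 420
    else:
        source = "cache"                             # step 420
    trace.append(State.IDLE)                         # assumed return to IDLE
    return source, trace


line = {"tag_match": True, "valid": True, "use_once": True}
first_source, first_trace = service_fetch(line)
second_source, second_trace = service_fetch(line)
```

The first request is served from the cache through the USE ONCE state; the second request finds the valid bit clear and goes to main memory through the MISS state.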

FIG. 5 is a more particular example illustrating the operation of a three-processor 
implementation of the multiprocessor system 100 in accordance with the invention. The three 
processors are denoted P0, P1 and P2 for purposes of this example. The figure includes portions (a)
through (g) which represent the changes in the state of the three-processor system 100 as software
breakpoints are taken with the use-once mechanism described above. For each of the portions (a)
through (g), the states of the three processors are shown. Each line indicates a different aspect of 
the state of the corresponding processor. The Pi state indicates whether the corresponding processor 
is in a debug state or a nondebug state. These states are also referred to as modes. The other states 
utilized in the figure are as follows: 

Pi cache[. . .] V state indicates the state of the valid bit for a specific set in the cache for Pi.

Pi cache[. . .] U state indicates the state of the use-once bit for a specific set in the cache for Pi.

Pi cache[. . .] Tag state indicates the state of the tag part for a specific set in the cache for Pi.

Pi cache[. . .] Data state indicates the state of the data part for a specific set in the cache for Pi.

The expression index(x) denotes the index field 204 of the address of x. The expression
cache[y] denotes the set 210 of the cache addressed by y. The expression tag(z) indicates the tag field
202 of the address of z.

In (a), all three processors are executing operational code and are not in the debug mode.

In (b), P0 has fetched an instruction on which a software breakpoint has been set. The
instruction on which the software breakpoint is set is indicated by swbpt. This instruction is stored 
in the cache in the set index(swbpt). After this instruction has been fetched into the cache, the valid 
bit V of this set is 1 and the use-once bit U of this set is 0. The tag 202 and data parts of this set are 
written with the tag part of the address and the actual data fetched from main memory 104. The data
for this instruction is the DEBUG opcode. 

In (c), P0 has executed the DEBUG opcode and entered its debug mode. The state of the
cache set holding the DEBUG instruction is the same as in (b).

In (d), P0 is ready to exit its debug mode and resume normal operation. The debugger 302
has written the use-once bit to 1 and the data part of the cache set to the actual instruction that was
replaced by a DEBUG opcode in main memory 104. Meanwhile, P2 has fetched the same
instruction on which the software breakpoint is set. P2 fetches the instruction from main memory
104, which is still the DEBUG opcode, and writes it into its instruction cache, setting the valid bit, 
tag, and data parts of the corresponding cache set. This state illustrates how the use-once logic 
allows one processor to execute the actual instruction on which the software breakpoint is set without 
interfering with another processor hitting the breakpoint. 

In (e), P0 has resumed normal operation. When P0 resumed, it addressed the cache with the
address of swbpt. Since the V and U bits for this set are 1, the cache returned the bpopcode as the
fetched instruction and cleared the V and U bits. P2 has executed the DEBUG opcode and entered 
debug mode. 

In (f), P0 continues normal operation. P2 is ready to exit its debug mode and resume normal
operation. The debugger 302 has written the use-once bit to 1 and the data part of the cache set to 
the actual instruction that was replaced by a DEBUG opcode in main memory 104. 

In (g), P2 has resumed normal operation. When P2 resumed, it addressed the cache with the
address of swbpt. Since the V and U bits for this set are 1, the cache returned the bpopcode as the
fetched instruction and cleared the V and U bits. P0 is still executing normally.

The foregoing example is intended for purposes of illustration only, and should not be 
viewed as limiting the scope of the invention in any way. 
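
The sequence (a) through (g) can be replayed with a small simulation. It is a sketch only: the `Processor` class, the single modeled cache set per processor with tag matching omitted, the address `SWBPT`, and the string `"ACTUAL_INSTRUCTION"` standing in for the instruction replaced by the DEBUG opcode are all assumptions introduced for this illustration.

```python
DEBUG = "DEBUG"
ACTUAL = "ACTUAL_INSTRUCTION"  # hypothetical stand-in for the replaced instruction
SWBPT = 0x400                  # hypothetical breakpoint address


class Processor:
    """One processor with a single modeled cache set (tag matching omitted)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.mode = "nondebug"
        self.v = 0        # valid bit of the modeled set
        self.u = 0        # use-once bit of the modeled set
        self.data = None  # data part of the modeled set

    def fetch(self, addr):
        if self.v:
            fetched = self.data
            if self.u:
                self.v = 0  # use-once: invalidate after a single fetch
                self.u = 0
        else:
            fetched = self.main_memory[addr]
            self.v, self.u, self.data = 1, 0, fetched
        if fetched == DEBUG:
            self.mode = "debug"  # breakpoint taken
        return fetched

    def debugger_prepare_resume(self, instruction):
        # Debugger writes the actual instruction and sets the U bit.
        self.v, self.u, self.data = 1, 1, instruction
        self.mode = "nondebug"


main_memory = {SWBPT: DEBUG}
p0, p1, p2 = (Processor(main_memory) for _ in range(3))  # (a)

p0.fetch(SWBPT)                      # (b), (c): P0 takes the breakpoint
p0_mode_c = p0.mode
p0.debugger_prepare_resume(ACTUAL)   # (d): P0 ready to resume
p2_fetch_d = p2.fetch(SWBPT)         # (d): P2 still sees DEBUG
p0_fetch_e = p0.fetch(SWBPT)         # (e): P0 executes the actual instruction
p0_bits_e = (p0.v, p0.u)
p2.debugger_prepare_resume(ACTUAL)   # (f): P2 ready to resume
p2_fetch_g = p2.fetch(SWBPT)         # (g): P2 executes the actual instruction
p1_fetch = p1.fetch(SWBPT)           # any later fetch by P1 still hits DEBUG
```

Throughout the run, `main_memory[SWBPT]` holds the DEBUG opcode, which is the property that keeps the breakpoint visible to every processor.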

Advantageously, the invention provides substantially improved software breakpoint
capability in a multiprocessor system with little or no impact on processor cycle time, minimal 
hardware cost, and without requiring program alteration. Importantly, the action of clearing the valid 
bit in response to the set use-once bit does not take place during the lookup process where the index 
is used to find a set and compare the tags. Rather, the action of clearing the valid bit takes place after
the hit determination is complete. In addition, there is no need for private local memory in each
processor since the existing instruction cache memory is used. Furthermore, the executable code of 
a given user program can be used as is. There is no need to relocate it to private local memory so 
that the debug opcode can be managed. 

It should also be noted that the invention makes the debugger more efficient when resuming 
from a breakpoint. With the invention, the debugger no longer has to perform the
previously-described two-step process of "stepping" the actual instruction and then replacing the debug opcode.
More particularly, the invention eliminates the latter of these two steps, which is a significant
performance advantage for the debugger. 

Although illustrated using an actual instruction cache, the invention can be implemented at
least in part utilizing dedicated hardware which provides the functionality described above but does
not otherwise serve as an actual instruction cache in normal operation of a multiprocessor system.
In other words, the dedicated hardware is configured substantially as an instruction cache but is used 
only in implementing breakpoints in accordance with the techniques of the invention. The term 
"instruction cache" as used herein is intended to include this type of dedicated hardware. 

The present invention may be configured to meet the requirements of a variety of different
processing applications and environments, using any desired types and arrangements of processor,
instruction cache, bus and main memory elements. The above-described embodiments of the
invention are therefore intended to be illustrative only. For example, although the illustrative
embodiment utilizes a single-bit use-once indicator for breakpointed instructions stored in an
instruction cache, multiple-bit indicators can also be used, as can other arrangements of different
types of single-bit or multiple-bit indicators. In addition, the particular manner in which the use-once
indicator is updated may be altered. Furthermore, the breakpoint code, instruction, instruction
address and cache set configurations may be varied as required to accommodate a given processing
application or environment. These and numerous other alternative embodiments within the scope
of the following claims will be apparent to those skilled in the art. 